Marshall Rose once wrote a paper on MH entitled “How to process 200 messages a day and still get some real work done.” This chapter could be entitled “How to process 1000 spams a day and still get some real work done.”
We use the terms junk mail and spam interchangeably for any unwanted message, including spam, viruses, and worms. The opposite of spam is ham. The act of classifying a sender as one who sends junk mail is called blacklisting; the opposite is called whitelisting.
J ? Display cheat sheet for the commands of the current prefix in minibuffer (mh-prefix-help).
J b Blacklist range as spam (mh-junk-blacklist).
J w Whitelist range as ham (mh-junk-whitelist).
mh-spamassassin-identify-spammers Identify spammers who are repeat offenders.
The following table lists the options from the ‘mh-junk’ customization group.
mh-junk-background If on, spam programs are run in background (default: ‘off’).
mh-junk-disposition Disposition of junk mail (default: ‘Delete Spam’).
mh-junk-program Spam program that MH-E should use (default: ‘Auto-detect’).
The following option in the ‘mh-sequences’ customization group is also available.
mh-whitelist-preserves-sequences-flag On means that sequences are preserved when messages are whitelisted (default: ‘on’).
The following hooks are available.
mh-blacklist-msg-hook Hook run by J b (mh-junk-blacklist) after marking each message for blacklisting (default: ‘nil’).
mh-whitelist-msg-hook Hook run by J w (mh-junk-whitelist) after marking each message for whitelisting (default: ‘nil’).
The following faces are available.
mh-folder-blacklisted Blacklisted message face.
mh-folder-whitelisted Whitelisted message face.
MH-E depends on SpamAssassin, bogofilter, or SpamProbe to throw the dreck away. This chapter describes briefly how to configure these programs to work well with MH-E and how to use MH-E’s interface that provides continuing education for these programs.
The default setting of the option mh-junk-program
is ‘Auto-detect’ which means that MH-E
will automatically choose one of SpamAssassin, bogofilter, or
SpamProbe in that order. If, for example, you have both
SpamAssassin and bogofilter installed and you want to use
bogofilter, then you can set this option to
‘Bogofilter’.
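To predict which program ‘Auto-detect’ is likely to choose on a given machine, the documented order can be mimicked from the shell. This is only a sketch: the function name detect_junk_program is hypothetical, and it assumes that the programs are installed under the executable names spamassassin, bogofilter, and spamprobe. MH-E's own detection may examine other executables.

```shell
#!/bin/sh
# Print the first of the three programs found on PATH, checking them in
# the order given in the text above. Returns 1 if none is found.
# (Hypothetical helper; not part of MH-E.)
detect_junk_program() {
    for prog in spamassassin bogofilter spamprobe; do
        if command -v "$prog" >/dev/null 2>&1; then
            printf '%s\n' "$prog"
            return 0
        fi
    done
    return 1
}
```

Calling detect_junk_program with no arguments prints a single program name, or prints nothing and returns a non-zero status if none of the three is installed.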
The command J b (mh-junk-blacklist)
trains the spam program in use with the content of the range (see
Ranges) and then handles the
message(s) as specified by the option
mh-junk-disposition. By default, this option is set
to ‘Delete Spam’, but you can also
specify the name of a folder, which is useful for building a
corpus of spam for training purposes.
In contrast, the command J w
(mh-junk-whitelist) reclassifies a range of messages
(see Ranges) as ham if they were
incorrectly classified as spam. It then refiles the messages into
the +inbox folder.
If a message is in any sequence (except
‘Previous-Sequence:’ and
‘cur’) when it is whitelisted, then it
will still be in those sequences in the destination folder. If
this behavior is not desired, then turn off the option
mh-whitelist-preserves-sequences-flag.
By default, the programs are run in the foreground, but this
can be slow when junking large numbers of messages. If you have
enough memory or don’t junk that many messages at the same
time, you might try turning on the option
mh-junk-background.
The following sections discuss the various counter-spam measures that MH-E can work with.
SpamAssassin is one of the more popular spam filtering programs. Get it from your local distribution or from the SpamAssassin web site.
To use SpamAssassin, add the following recipes to ~/.procmailrc:
    PATH=$PATH:/usr/bin/mh
    MAILDIR=$HOME/`mhparam Path`

    # Fight spam with SpamAssassin.
    :0fw
    | spamc

    # Anything with a spam level of 10 or more is junked immediately.
    :0:
    * ^X-Spam-Level: ..........
    /dev/null

    :0:
    * ^X-Spam-Status: Yes
    spam/.
If you don’t use spamc, use
‘spamassassin -P -a’.
Note that one of the recipes above throws away messages with a score greater than or equal to 10. Here’s how you can determine a value that works best for you.
First, run ‘spamassassin -t’ on every
mail message in your archive and use gnumeric to
verify that the average plus the standard deviation of good mail
is under 5, the SpamAssassin default for “spam”.
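If a spreadsheet is not at hand, the same check can be sketched in the shell. The function below assumes that the scores have already been collected, one number per line, for example in a hypothetical file named scores.txt. How the score is extracted from the ‘spamassassin -t’ output depends on the SpamAssassin version and is not shown. The name score_stats is made up for this example, and the standard deviation computed is the population form.

```shell
#!/bin/sh
# Read one numeric score per line on stdin and print the number of
# scores, their mean, the population standard deviation, and the
# mean plus one standard deviation. Blank lines are ignored.
score_stats() {
    awk '
        NF { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n == 0) { print "no scores"; exit 1 }
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0      # guard against rounding error
            sd = sqrt(var)
            printf "n=%d mean=%.2f sd=%.2f mean+sd=%.2f\n", n, mean, sd, mean + sd
        }'
}
```

With the scores of known good mail in scores.txt, ‘score_stats < scores.txt’ prints a single summary line, and ‘sort -rn scores.txt’ lists the highest scores first.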
Using gnumeric, sort the messages by score and
view the messages with the highest score. Determine the score
which encompasses all of your interesting messages and add a
couple of points to be conservative. Add that many dots to the
‘X-Spam-Level:’ header field above to
send messages with that score down the drain.
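Counting dots by hand is error-prone once the threshold grows, so the condition line can be generated instead. This is a small sketch; the function name spam_level_condition is hypothetical, and it only prints the line, which you then paste into ~/.procmailrc yourself.

```shell
#!/bin/sh
# Print a procmail condition that matches an X-Spam-Level header field
# with at least $1 characters, in the style of the recipe above.
# $1 must be a positive whole number; otherwise nothing is printed and
# the function returns 1.
spam_level_condition() {
    awk -v n="$1" 'BEGIN {
        if (n !~ /^[0-9]+$/ || n < 1) exit 1
        dots = ""
        for (i = 0; i < n; i++) dots = dots "."
        print "* ^X-Spam-Level: " dots
    }'
}
```

For example, ‘spam_level_condition 12’ prints a condition line with twelve dots.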
In the example above, messages with a score of 5–9 are set aside in the ‘+spam’ folder for later review. The major weakness of rules-based filters is a plethora of false positives, so it is worthwhile to check this folder.
If SpamAssassin classifies a message incorrectly, or is
unsure, you can use the MH-E commands J b
(mh-junk-blacklist) and J w
(mh-junk-whitelist).
The command J b (mh-junk-blacklist)
adds a ‘blacklist_from’ entry to
~/.spamassassin/user_prefs, deletes the message, and
sends the message to Razor, so that others might not see this
spam. If the sa-learn command is available, the
message is also recategorized as spam.
The command J w (mh-junk-whitelist)
adds a ‘whitelist_from’ rule to
‘~/.spamassassin/user_prefs’. If the
sa-learn command is available, the message is also
recategorized as ham.
Over time, you’ll observe that the same host or domain
occurs repeatedly in the
‘blacklist_from’ entries, so you might
think that you could avoid future spam by blacklisting all mail
from a particular domain. The utility function
mh-spamassassin-identify-spammers helps you do
precisely that. This function displays a frequency count of the
hosts and domains in the
‘blacklist_from’ entries from the last
blank line in ~/.spamassassin/user_prefs to the end
of the file. You can use this information to replace
multiple ‘blacklist_from’ entries with a
single wildcard entry such as:
blacklist_from *@*amazingoffersdirect2u.com
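A similar count can be approximated outside of Emacs. The sketch below is not the implementation of mh-spamassassin-identify-spammers: it reads the whole file rather than starting at the last blank line, and it counts only the text after the “@” of each entry. The function name blacklist_domain_counts is hypothetical.

```shell
#!/bin/sh
# Print "COUNT DOMAIN" lines, most frequent first, for the blacklist_from
# entries in the file named by $1 (for example ~/.spamassassin/user_prefs).
# Other lines, such as whitelist_from entries, are ignored.
blacklist_domain_counts() {
    awk '$1 == "blacklist_from" && index($2, "@") {
             sub(/^.*@/, "", $2)      # keep only the part after the last "@"
             print tolower($2)
         }' "$1" |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'        # normalize the padding added by uniq -c
}
```

Running ‘blacklist_domain_counts ~/.spamassassin/user_prefs’ lists the domains that appear most often at the top, which are the candidates for a wildcard entry.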
In versions of SpamAssassin (2.50 and on) that support a
Bayesian classifier, J b
(mh-junk-blacklist) uses the program
sa-learn to recategorize the message as spam.
Neither MH-E nor SpamAssassin rebuilds the database after
adding words, so you will need to run ‘sa-learn
--rebuild’ periodically. This can be done by adding
the following to your crontab:
0 * * * * sa-learn --rebuild > /dev/null 2>&1
Bogofilter is a Bayesian spam filtering program. Get it from your local distribution or from the bogofilter web site.
Bogofilter is taught by running:
bogofilter -n < good-message
on every good message, and
bogofilter -s < spam-message
on every spam message. This is called full training; three other training methods are described in the FAQ that is distributed with bogofilter. Note that most Bayesian filters need 1000 to 5000 of each type of message to start doing a good job.
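Since an MH folder stores each message in its own file, full training can be sketched as a loop over the files of a folder. The function name train_dir and the folder paths in the usage note are assumptions for illustration; only the ‘-n’ and ‘-s’ flags come from the commands above.

```shell
#!/bin/sh
# Feed every regular file in directory $2 to bogofilter, one message per
# invocation, using flag $1 (-n for good messages, -s for spam).
# Subdirectories are skipped, and dot files such as .mh_sequences are
# not matched by the glob.
train_dir() {
    flag=$1
    dir=$2
    for msg in "$dir"/*; do
        [ -f "$msg" ] || continue
        bogofilter "$flag" < "$msg"
    done
}
```

Assuming your mail lives under ~/Mail, ‘train_dir -n ~/Mail/inbox’ followed by ‘train_dir -s ~/Mail/spam’ would train on one folder of each kind. Make sure that each folder contains only messages of the stated type before running it.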
To use bogofilter, add the following recipes to ~/.procmailrc:
    PATH=$PATH:/usr/bin/mh
    MAILDIR=$HOME/`mhparam Path`

    # Fight spam with Bogofilter.
    :0fw
    | bogofilter -3 -e -p

    :0:
    * ^X-Bogosity: Yes, tests=bogofilter
    spam/.

    :0:
    * ^X-Bogosity: Unsure, tests=bogofilter
    spam/unsure/.
If bogofilter classifies a message incorrectly, or is unsure,
you can use the MH-E commands J b
(mh-junk-blacklist) and J w
(mh-junk-whitelist) to update bogofilter’s
training.
The Bogofilter FAQ suggests that you run the following occasionally to shrink the database:
    bogoutil -d wordlist.db | bogoutil -l wordlist.db.new
    mv wordlist.db wordlist.db.prv
    mv wordlist.db.new wordlist.db
The Bogofilter tuning HOWTO describes how you can fine-tune bogofilter.
SpamProbe is a Bayesian spam filtering program. Get it from your local distribution or from the SpamProbe web site.
To use SpamProbe, add the following recipes to ~/.procmailrc:
    PATH=$PATH:/usr/bin/mh
    MAILDIR=$HOME/`mhparam Path`

    # Fight spam with SpamProbe.
    :0
    SCORE=| spamprobe receive

    :0 wf
    | formail -I "X-SpamProbe: $SCORE"

    :0:
    *^X-SpamProbe: SPAM
    spam/.
If SpamProbe classifies a message incorrectly, you can use the
MH-E commands J b (mh-junk-blacklist) and
J w (mh-junk-whitelist) to update
SpamProbe’s training.
There are a couple of things that you can add to ~/.procmailrc in order to filter out a lot of spam and viruses. The first is to eliminate any message with a Windows executable (which is most likely a virus). The second is to eliminate mail in character sets that you can’t read.
    PATH=$PATH:/usr/bin/mh
    MAILDIR=$HOME/`mhparam Path`

    #
    # Filter messages with w32 executables/virii.
    #
    # These attachments are base64 and have a TVqQAAMAAAAEAAAA//8AALg
    # pattern. The string "this program cannot be run in MS-DOS mode"
    # encoded in base64 is 4fug4AtAnNIbg and helps to avoid false
    # positives (Roland Smith via Pete from the bogofilter mailing list).
    #
    :0 B:
    * ^Content-Transfer-Encoding:.*base64
    * ^TVqQAAMAAAAEAAAA//8AALg
    * 4fug4AtAnNIbg
    spam/exe/.

    #
    # Filter mail in unreadable character sets (from the Bogofilter FAQ).
    #
    UNREADABLE='[^?"]*big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987'

    :0:
    * 1^0 $ ^Subject:.*=\?($UNREADABLE)
    * 1^0 $ ^Content-Type:.*charset="?($UNREADABLE)
    spam/unreadable/.

    :0:
    * ^Content-Type:.*multipart
    * B ?? $ ^Content-Type:.*^?.*charset="?($UNREADABLE)
    spam/unreadable/.
Note that the option mh-junk-background is used
as the display argument in the call to
call-process. Therefore, turning on this option
means setting its value to ‘0’. You
can also set its value to ‘t’ to
direct the programs’ output to the *MH-E
Log* buffer; this may be useful for debugging.